12 research outputs found

    DKVF: A Framework for Rapid Prototyping and Evaluating Distributed Key-value Stores

    Full text link
    We present our framework DKVF that enables one to quickly prototype and evaluate new protocols for key-value stores and compare them with existing protocols based on selected benchmarks. Due to limitations of CAP theorem, new protocols must be developed that achieve the desired trade-off between consistency and availability for the given application at hand. Hence, both academic and industrial communities focus on developing new protocols that identify a different (and hopefully better in one or more aspect) point on this trade-off curve. While these protocols are often based on a simple intuition, evaluating them to ensure that they indeed provide increased availability, consistency, or performance is a tedious task. Our framework, DKVF, enables one to quickly prototype a new protocol as well as identify how it performs compared to existing protocols for pre-specified benchmarks. Our framework relies on YCSB (Yahoo! Cloud Servicing Benchmark) for benchmarking. We demonstrate DKVF by implementing four existing protocols --eventual consistency, COPS, GentleRain and CausalSpartan-- with it. We compare the performance of these protocols against different loading conditions. We find that the performance is similar to our implementation of these protocols from scratch. And, the comparison of these protocols is consistent with what has been reported in the literature. Moreover, implementation of these protocols was much more natural as we only needed to translate the pseudocode into Java (and add the necessary error handling). Hence, it was possible to achieve this in just 1-2 days per protocol. Finally, our framework is extensible. It is possible to replace individual components in the framework (e.g., the storage component)

    Auditable Restoration of Distributed Programs

    Full text link
    We focus on a protocol for auditable restoration of distributed systems. The need for such protocol arises due to conflicting requirements (e.g., access to the system should be restricted but emergency access should be provided). One can design such systems with a tamper detection approach (based on the intuition of "break the glass door"). However, in a distributed system, such tampering, which are denoted as auditable events, is visible only for a single node. This is unacceptable since the actions they take in these situations can be different than those in the normal mode. Moreover, eventually, the auditable event needs to be cleared so that system resumes the normal operation. With this motivation, in this paper, we present a protocol for auditable restoration, where any process can potentially identify an auditable event. Whenever a new auditable event occurs, the system must reach an "auditable state" where every process is aware of the auditable event. Only after the system reaches an auditable state, it can begin the operation of restoration. Although any process can observe an auditable event, we require that only "authorized" processes can begin the task of restoration. Moreover, these processes can begin the restoration only when the system is in an auditable state. Our protocol is self-stabilizing and has bounded state space. It can effectively handle the case where faults or auditable events occur during the restoration protocol. Moreover, it can be used to provide auditable restoration to other distributed protocol.Comment: 10 page

    Ensuring Average Recovery with Adversarial Scheduler

    Get PDF
    In this paper, we focus on revising a given program so that the average recovery time in the presence of an adversarial scheduler is bounded by a given threshold lambda. Specifically, we consider the scenario where the fault (or other unexpected action) perturbs the program to a state that is outside its set of legitimate states. Starting from this state, the program executes its actions/transitions to recover to legitimate states. However, the adversarial scheduler can force the program to reach one illegitimate state that requires a longer recovery time. To ensure that the average recovery time is less than lambda, we need to remove certain transitions/behaviors. We show that achieving this average response time while removing minimum transitions is NP-hard. In other words, there is a tradeoff between the time taken to synthesize the program and the transitions preserved to reduce the average convergence time. We present six different heuristics and evaluate this tradeoff with case studies. Finally, we note that the average convergence time considered here requires formalization of hyperproperties. Hence, this work also demonstrates feasibility of adding (certain) hyperproperties to an existing program

    Automatic Addition of Fault-Tolerance in Presence of Unchangeable Environment Actions †

    No full text
    We focus on the problem of adding fault-tolerance to an existing concurrent protocol in the presence of unchangeable environment actions. Such unchangeable actions occur in cases where a subset of components/processes cannot be modified since they represent third-party components or are constrained by physical laws. These actions differ from faults in that they are (1) simultaneously collaborative and disruptive, (2) essential for satisfying the specification and (3) possibly non-terminating. Hence, if these actions are modeled as faults while adding fault-tolerance, it causes existing model repair algorithms to declare failure to add fault-tolerance. We present a set of algorithms for adding stabilization and fault-tolerance for programs that run in the presence of environment actions. We prove the soundness, completeness and the complexity of our algorithms. We have implemented all of our algorithms using symbolic techniques in Java. The experimental results of our algorithms for various examples are also provided
    corecore